two rows and two columns. Because the fourfold table provides the opportunity for some

particularly insightful calculations, it’s worth a chapter of its own.

In Chapter 14, you discover how the terminology used in epidemiologic studies is applied to

specifically formatted fourfold tables to calculate incidence and prevalence rates.

Looking for relationships between variables

Epidemiology and biostatistics are interested in causal inference, which means trying to figure out

what causes particular outcomes in biological research. While it is possible to look at the relationship

between two variables in a bivariate analysis, regression analysis is the part of statistics that enables

you to explore the relationship between multiple variables and one outcome in the same model so you

can evaluate their relative contributions to the outcome. Here are some use cases for regression:

You may want to know whether there’s a statistically significant association between one or more

variables and an outcome, even after accounting for other variables in the model. You may ask: Does being

overweight increase the likelihood of getting liver cancer? Or: Is exercising fewer hours per week

associated with higher blood pressure measurements? In answering both of those questions, you

may want to control for other variables known to influence the outcome.

You may want to develop a formula for predicting the value of a variable from the observed values

of one or more other variables. For example, you may want to predict how long a newly diagnosed

cancer patient may survive based on their age, obesity status, and medical history.

You may be fitting a theoretical formula to some data to estimate one of the parameters appearing

in that formula. An example of such a problem is determining how fast the kidneys can remove a

drug from the body, a quantity known as the terminal elimination rate constant. It can be estimated

from measurements of drug concentration in the blood taken at various times after taking a dose of

the drug.
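
The drug-elimination example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the concentration is assumed to decline as a single exponential, C(t) = C0 × exp(−k × t), so that the natural log of concentration falls on a straight line whose slope is −k. The sampling times, the starting concentration, and the rate constant below are made-up values, and the function name is hypothetical.

```python
import math
import numpy as np

# Hypothetical sampling times (hours after the dose); made-up values.
times_hr = np.array([4.0, 6.0, 8.0, 12.0, 16.0, 24.0])

# Synthetic, noise-free concentrations generated from the ASSUMED model
# C(t) = C0 * exp(-k * t), with C0 = 50 mg/L and k = 0.2 per hour.
TRUE_C0 = 50.0
TRUE_K = 0.2
conc_mg_per_l = TRUE_C0 * np.exp(-TRUE_K * times_hr)


def estimate_elimination_rate(times, concentrations):
    """Fit a straight line to ln(concentration) versus time.

    Returns (k, c0): k is the negative of the fitted slope, and c0 is
    exp(intercept), the extrapolated concentration at time zero.
    """
    slope, intercept = np.polyfit(times, np.log(concentrations), 1)
    return -slope, math.exp(intercept)


k_est, c0_est = estimate_elimination_rate(times_hr, conc_mg_per_l)

# Under the same single-exponential assumption, half-life = ln(2) / k.
half_life_hr = math.log(2) / k_est

print(f"Estimated k: {k_est:.3f} per hour")
print(f"Estimated C0: {c0_est:.1f} mg/L")
print(f"Estimated half-life: {half_life_hr:.2f} hours")
```

With real measurements the points would scatter around the line, and you would also want a standard error or confidence interval for k rather than a single number.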

Regression analysis can manage all these tasks and many more. Regression is so important in

biological research that all the chapters in Part 5 are focused on some aspect of regression.

If you have never learned correlation and regression analysis, read Chapter 15, which

introduces these topics. Chapter 16 covers simple straight-line regression, which uses a single

predictor variable. Chapter 17 extends that to multiple regression, which uses more than one

predictor variable. These three chapters deal with ordinary linear regression, where you’re

trying to predict the value of a numerical outcome variable from one or more other variables. An

example would be trying to predict mean blood hemoglobin concentration using variables like

age, blood pressure level, and Type II diabetes status. Ordinary linear regression uses a formula

that’s a simple summation of terms, each of which consists of a predictor variable multiplied by a

regression coefficient.
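
Written out, that summation might look like the following. The symbols are illustrative and are not taken from the text: Y-hat is the predicted outcome, X_1 through X_k are the predictor variables, and b_1 through b_k are their regression coefficients. The constant term b_0 (the intercept) is an added assumption here; most fitted models include one. The second line applies the same pattern to the hemoglobin example, with hypothetical variable names.

```latex
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k

\widehat{\text{Hemoglobin}} = b_0 + b_1 \cdot \text{Age} + b_2 \cdot \text{BloodPressure} + b_3 \cdot \text{Diabetes}
```

In this sketch, Diabetes would be coded as 1 for people with Type II diabetes and 0 for everyone else, so b_3 would be the estimated difference in mean hemoglobin between the two groups when age and blood pressure are held constant.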

But in real-world biological and epidemiologic research, you encounter more complicated

relationships. Chapter 18 describes logistic regression, where the outcome is the occurrence or non-

occurrence of an event (such as being diagnosed with Type II diabetes), and you want to predict the

probability that the event will occur. You also find out about several other kinds of regression in